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ABSTRACT 

Before it becomes a sensitive probe of the Epoch of Reionization, the Dark Ages, and 
fundamental physics, 21 cm tomography must successfully contend with the issue of 
foreground contamination. Broadband foreground sources are expected to be roughly 
four orders of magnitude larger than any cosmological signals, so precise foreground 
models will be necessary. Such foreground models often contain a large number of 
parameters, reflecting the complicated physics that governs foreground sources. In 
this paper, we concentrate on spectral modeling (neglecting, for instance, bright point 
source removal from spatial maps) and show that 21 cm tomography experiments will 
likely not be able to measure these parameters without large degeneracies, simply 
because the foreground spectra are so featureless and generic. However, we show that 
this is also an advantage, because it means that the foregrounds can be characterized 
to a high degree of accuracy once a small number of parameters (likely three or four, 
depending on one's instrumental specifications) are measured. This provides a simple 
understanding for why 21 cm foreground subtraction schemes are able to remove most 
of the contaminants by suppressing just a small handful of simple spectral forms. 
In addition, this suggests that the foreground modeling process should be relatively 
simple and will likely not be an impediment to the foreground subtraction schemes 
that are necessary for a successful 21cm tomography experiment. 

Key words: Cosmology: Early Universe - Radio Lines: General - Techniques: Inter- 
ferometric - Methods: Data Analysis 



1 INTRODUCTION 

In coming years, 21cm tomography has the potential to 
vastly expand our knowledge of both astrophysics and cos- 
mology, with its unique ability to be a direct probe of the 
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cosmological parameters at an unprecedented level of accu- 
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First steps towards realizing this potential are being taken 
with the construction, testing, and initial science runs of 



low- frequency radio arrays such as PAPER (Parsons et al 
2010] ) LOFAR (jGarrett|2009| ), MW A (jLonsdale et al.|2009fc 
CRT ( jPeterson et al.|2006|>, GMRT (jPaciga et al |201l| 7and 



the Omniscope (Tegmark & Zaldarriaga 20101. However, 
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many obstacles still need to be overcome before cosmologi- 
cal measurements can be performed on the resulting data. 
For instance, foreground contamination (both from within 
our galaxy and from extragalactic sources) is expected to 
be at least four orders of magnitude brighter than the ex- 
pected cosmological signal ( |de Oliveira-Costa et al.|2008[ ), so 
any viable data analysis scheme must also contain a robust 
foreground subtraction algorithm. 

Foreground subtraction is a problem that has been stud- 
ied extensively in the literature, and many schemes have 
been proposed for tackling the issue. Early ideas focused on 
using the angular structure of the foregrounds to separate 



them from the cosmological signal (Di Matteo et al 

2005| |Zaldarriaga 



2004 



2004 
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, while most recent proposals focus on line-of-sight (i.e. 



spectral) information I 
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Petrovic & Oh||2011|). By 



making use of spectral information, these proposals take ad- 
vantage of the extremely high spectral resolution available 
in all 21 cm experiments, and indeed it was shown in |Liu| 



& Tegmark (20111 that because of the nature of the fore- 
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grounds and the instrumental parameters, an optimal esti- 
mation of the power spectrum should involve a foreground 
subtraction scheme that operates primarily using frequency 
information. 

Most line-of-sight proposals so far have been blind 
scheme^Jin the sense that they do not require any prior fore- 
ground modeling. All such proposals take advantage of the 
smooth nature of the foreground spectra, and separate out 
the rapidly fluctuating cosmological signal by (for instance) 
subtracting off a predetermined set of low-order polynomi- 
als ( Bowman et al.|2009 Liu et al. 2009a b) or by imposing 



of foreground contamination ( Shaver et al.|19 99 Wang et al. 
20061: 



a predetermined filter in Fourier space ( Paciga et al.pOlT 
Petrovic & Oh 2011 1. The blind nature of these schemes may 



seem at first to be an advantage, because the low frequency 
(~ 50 to 300 MHz) regime of radio foregrounds is as yet 
fairly unconstrained observationally and most models are 
based on extrapolations and interpolations from other fre- 
quencies, where the instruments are optimized for different 
science goals. However, even if our foreground models are not 
entirely accurate initially, a non-blind scheme can always be 
performed iteratively until the models converge to the true 
measured foregrounds. Moreover, blind schemes do not im- 
prove as one makes better and better measurements of the 
foregrounds, whereas non-blind schemes will continually im- 
prove as measurements place increasingly strong constraints 
on foreground models. In this paper we therefore examine 
the foreground modeling process, and determine whether or 
not it will be possible to construct a foreground model that 
is good enough for foreground subtraction purposes if only a 
small number of independent parameters can be measured. 
In the spirit of "one scientist's noise is another scientist's 
signal", we also quantify the ability of 21 cm experiments to 
place constraints on phenomenological foreground parame- 
ters. 

The rest of this paper is organized as follows. In Sec- 
tion [2] we introduce our foreground model, and we show in 
Section [3] that the foregrounds can be described to an ex- 
tremely high precision using just three parameters. This is 
good news for 21 cm cosmology, for it implies that by mea- 
suring just a small handful of empirical parameters, it is 
possible to construct a foreground model that is sufficiently 
accurate for high precision foreground subtraction. However, 
this also implies that the large number of (physically moti- 
vated) parameters present in a typical foreground param- 
eterization are redundant and highly degenerate with each 
other, so the prospects for using 21 cm experiments to gain a 
detailed understanding of foreground physics are bleak. We 
show this in Section [1J and summarize our conclusions in 
Section [5] 



2 FOREGROUND AND NOISE MODEL 

For the purposes of this paper, we limit our analyses to 
foregrounds in the frequency range 100 to 250 MHz, which 
roughly speaking covers the "sweet spots" of most 21 cm ex- 
periments that are designed to probe the Epoch of Reioniza- 
tion. At these frequencies, there are three dominant sources 



(i) Extragalactic point sources. 

(ii) Synchrotron emission from our Galaxy. 

(iii) Free-free emission from our Galaxy. 

Since foreground subtraction is best done using spectral in- 
formation (Liu & Tegmark 2 011| ), we will ignore the spatial 
structure of these foregrounds for this paper, as well as any 
polarization structura^l We now examine the spectra of each 
of these sources, and in particular compute their means and 
variances, which we define in the next section. With some 
minor modifications, our models are essentially those of Liu 
& TegmarklpOlTl. 



2.1 Definition of the mean and the foreground 
covariance 

Consider a multifrequency sky map of brightness tempera- 
ture T from a typical 21cm tomography experiment. The 
map T(f , v) is a function of the direction in the sky f as 
well as the frequency v. We define a random function x[y) 
such that 



c(v) = T(*,v) 



(1) 



is the spectrum measured at a randomly chosen pixel cen- 
tered at f . Since the cosmological signal is expected to be 
dwarfed by the foregrounds, this spectrum will essentially 
be a spectrum of the foregrounds. 

To quantify the statistical properties of this random 
function, we can calculate its mean and covariance. In princi- 
ple, doing so requires taking the ensemble averages. In prac- 
tice, we instead take averages of the pixels of the sky map. 
The mean is thus given by 



m{y) = {x{v)) 



T(r,u)dQ, 



(2) 



where f2 is the total area covered by the sky marj^J The 
statistical covariance can be similarly defined in the usual 
way: 

C{v,v) = (x(v)x(y)) — m(y)m[y), (3) 



where 



{x[u)x{v')) = ~f T{v,u)T{v,u')dn. (A) 



2.2 Extragalactic point sources 

Extragalactic point sources can be thought of as consisting 
of two populations. The first consists of bright, isolated point 
sources that can be resolved by one's instrument. Following 
previous foreground studies, we assume that these bright 
point sources have already been removed, prior to the main 



1 See Liu & Tegmark (2011 1 for an example that is not blind. 



2 See |de Oliveira-Costa et al.|(|2008[) for a dis cussion of the spatial 
structure of foregrounds, and Bcrnardi et al. (2010) and G eil et al.| 
( 2010j l for discussions of polarized foregrounds and their removal. 

3 Note that this sky map need not cover all An steradians of the 
full sky. For instance, in an estimation of the power spectrum one 
may choose to use only the data from the cleanest parts of the 
sky. 
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foreground cleaning step that attempts to subtract off the 
rest of the foregrounds. Techniques such as forward modeling 



(Bernardi et al. 20111 and peeling have been explored for 
this purpose, and peeling simulations suggest that the bright 
sources can be removed down to S ma x ~ 10 to 100 mjy 
( Pindor et al.|20TT l. To be conservative we will USe Smax — 
100 mjy in all calculations that follow. 

Below Smax is the second population of extragalactic 
point sources, consisting of a "confused" continuum of unre- 
solved point sources. Along a given line-of-sight, we imagine 
the number of sources to be Poisson distributed with an av- 
erage of nQ P ix sources, where n is the number of sources per 
steradian and Sl p i x is the pixel size. We model the spectrum 
of each source as a power law with a random spectral index 
a drawn from a Gaussian distribution 

\2" 



To compute the covariance of the distribution, we once 
again begin by considering a population of point sources 
of brightness S* whose number density is determined by 
Poisson statistics (thus giving a result proportional to nS*). 
We then integrate over the source count distribution to get 



dS* 



= (4274x 3 malK 2 ) 



1.25 



10- 6 Sr 



B 



4.0mJy- 1 Sr- 1 



p(a) 



-. exp 



(a- 



- ceo) 



2al 



(•») 



where ao is the mean spectral index (with its numerical 
value to be determined later) and a a — 0.5 (Tegmark et al. 



2000 1 . If all of the unresolved point sources had the same 
flux S* at some fiducial frequency v, = 150 MHz, the mean 
intensity would be 



m ps (v) 



(nn pix S*) / — p(a)da, (6) 



Note that no further subtraction of the mean term is neces- 
sary, since it was implicitly accomplished when we invoked 
Poisson statistics. 



2.3 Galactic Synchrotron Radiation 

For Galactic synchrotron radiation, we imagine the fore- 
ground spectrum in each pixel to be well fit by a power 
law with spectral index a, but that the value of the spectral 
index may vary from pixel to pixel. In a given pixel, the 
spectrum is thus 



where the quantity A v /Q. P i x converts our expression from 
having flux units to temperature units, and is given by 



&\V) — A s yn C ( 



(11) 



^r>ix 



A 



2/c Qpix 



1.4x10" 



n 



pix 

1 sr 



mjy 1 K, where A a 



335.4 K (Wang et al. 20061. Similar to the 



(7) 

where A is the wavelength, ks is Boltzmann's constant, and 
flpix is the pixel solid angle. Of course, in reality the sources 
do not all have the same flux. To take this into account, we 
extrapolate a source count distribution from an empirical 
study done at higher flux levels: 



point sources, we assume that the indices in different pix- 
els to be Gaussian distributed, only this time with a mean 
of ctsync = 2.8 and a standard deviation of Aa sync = 0.1 



I Wang et aL|2006| ). Performing the same integral as for the 
point sources, we obtain 



civ 
dSt 



= B 



S, 
880 mjy 



(8) 



A s 



(12) 



where following Di Matteo et al. ( |2002[ ) we take B — 
4.0mJy- 1 Sr- 1 and 7 = 1.75 as our fiducial valueiQ Inte- 
grating over the distribution, Equation [6] becomes 



Forming the foreground covariance using Equation [3j we ob- 
tain 



m pa (v) 



(17 ' Ax max K) 



B 



s~isync / l\ 



A: 



0.25 



4.0mJy- 1 Sr- 1 



X 



sync~\ 



ync 

\ ^* / 

sync / \ sync / l\ 

-m {v)m y [y ). 



(13) 



(9) 



where x m ax = S ma x/880 mjy, and a pa = Qo + 2 takes on a 
fiducial value of 2.5 to match measurements from the Cos- 
mic Microwave Background (Teg mark et al.|2000 1. Note also 
that this implies Qq « 0.5, which is consistent with both |To:F] 



2.4 Free-free Emission 

Free-free emission can be modeled in much the same way as 
the synchrotron radiation: 



folatti et al. ( 1998 1 and Jackson ( 2005 1 



m ss (y)= A ff 



(14) 



4 Despite the fact that the source count distribution was obtained 
by extrapolation, we expect that whatever the true distribution 
is, it should be well-approximated by a power law. This is be- 
cause the key quantity is the integral of the source count, which 
is dominated by the brightest sources of the population. In that 
regime, we can linearize the distribution in log-log space, giving 
a power law in S* . 



C lf {u,u')= A 



2 / vv 



If \ 



(15) 



but with A f f = 33.5 K, a ff = 2.15, and Aa ff = 0.01 ( |Wang| 
et al.|2006| |. 
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Table 1. Free parameters in our foreground model and their fidu- 
cial values. 



Parameter 


Description 


Fiducial 






Value 


B 


Source count normalization 


4.0 mjy- 1 Sr" 1 


7 


Source count power-law index 


1.75 




Point source spectral index 


2.5 


CT a 


Point source index spread 


0.5 


A S y n c 


Synchrotron amplitude 


335.4 K 


CX-sync 


Synchrotron spectral index 


2.8 




Synchrotron index coherence 


0.1 


Aff 


Free-free amplitude 


33.5 K 


a ff 


Free-free spectral index 


2.15 


Aa ff 


Free-free index coherence 


0.01 



2.5 Total Foreground Contribution 

To obtain total contribution to the mean signal, we simply 
sum the means of the various components: 

m(v) = m pa (v) + m sync (v) + m ss (y). (16) 

For the total covariance we sum the individual covariances: 

C(v,v') = C ps (v,v') + C sync (v,v') + C ff (v,v'). (17) 

In total, then, our foreground model consists of 10 free pa- 
rameters, which are listed in Table[l]along with their fiducial 
values. 



2.6 Noise model 

In general, the noise level in a given pixel of a sky map 
produced by a radio telescope/interferometer scales as 



\ 2 T S , 



A e \/AvAt 



(18) 



where Av is the channel width of a frequency bin, At is 
the integration time, A e is the collecting area of an antenna 
element, T sys is the system temperature, and A is the wave- 
length. To keep the discussion in this paper as general pos- 
sible, we do not explicitly model the effective area and the 
system temperature, since both depend on one's specific in- 
strument in complicated ways. Instead, we take a minimal- 
istic approach and simply anchor our noise to levels that 
are expected for current generation 21 cm tomography ex- 



periments. From Bowman et al. (20091, we know that with 



360 hours of integration at a channel width of 40 KHz, the 
MWA has a single pixel noise level at 158 MHz that is ap- 
proximately 330 mK. Scaling this to our fiducial values, we 
obtain 



(39 mK) 



1MHz 
Av 



( 1000 Ins 



At 



(19) 



In the numerical studies conducted in this paper, we will 
always be keeping At fixed at 1000 hrs. The frequency bin 
width Av will also be fixed at its fiducial value of 1 MHz, ex- 
cept when explicitly stated (for instance when we investigate 
the dependence of our results on Av). 

In the next section, we will be working almost exclu- 
sively in dimensionless units where the foreground power 



(quantified by the diagonal v = v' elements of the fore- 
ground covariance) is equal to unity across all frequencies 



(see Equation 22 1. In such units, the noise can be taken to 
be approximately white i.e. frequency independent. To see 
this, note that the effective area A e of an antenna scales as 
A 2 , which means that the quantity A 2 /A e has no frequency 
dependence, and o no i se depends on frequency only because 
of T sya . Typical 21 cm experiments are sky noise dominated 
(Morales & Wyithe 20101, which means that T sys should 



be dominated by the foreground temperature, and there- 
fore should have roughly the same frequency dependence 
as the foregrounds do. However, this frequency dependence 
was precisely the dependence that our choice of units was 
designed to null out. The noise thus becomes white to a 
good approximation, and is effectively a noise-to-signal ra- 
tio (henceforth denoted by k) . Note that while we adopt this 
approximation for the rest of the paper, it is by no means 
crucial, and in general one should always use units where the 
foreground covariance is white, even if in such units the noise 
is chromatic. For further discussion of this point, please see 
Section l3~2l 



3 THE EASE OF CHARACTERIZING 
FOREGROUNDS 

While the foreground parameters described in Table [I] are 
certainly conventional choices, they are not the most eco- 
nomical, in the sense that many of them are redundant. 
That this is the case is not surprising, since so many dif- 
ferent foreground sources can be accurately described by 
spectra that deviate only slightly from power laws. In Sec- 
tion |3 - 1 1 we will reparametrize our foreground model using 
a principal component analysis. In Section |3.4| we will de- 
termine the effective number of principal components that 
need to be measured to accurately describe the foreground 
spectra, and find the number to typically be three or four. 



3.1 Eigenforeground Modes 

Since a measured foreground spectrum will necessarily be 
discrete, we begin by discretizing the mean and the covari- 
ance of our foreground model from Section [2] so that 



and 



m(y a 



C a g = C(v a , Vg), 



(20) 



(21) 



where the indices run from 1 to N c , the total number of fre- 
quency channels in one's instrument. We define a correlation 
matrix 



R 



■a/3 



C a/ 3 



\J Cctct C 



(22) 



(3/3 



and work with it instead the covariance matrix. 

We now perform a principal component analysis on the 
foreground^] That is, we rewrite the correlation matrix in 

5 We perform the principal component analysis using the corre- 
lation matrix R instead of the covariance matrix C because the 
challenge with foreground modeling is to successfully describe the 
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a basis of "eigenforeground" vector^] where each eigenfore- 
ground vector v n satisfies the eigenvalue equation 



Rv n — AiV„. 



(23) 



Normalizing the eigenforeground vectors to unity and form- 
ing a matrix V where the columns of the matrix are the 
normalized eigenvectors, the correlation matrix can be ex- 
pressed as 



R = VAV' 



N c 

^ A„v„vl, 

n=l 



(24) 



where A = diag{Ai, A2, A3, . . . }. Note that since R is real 
and symmetric, V* = V" 1 , so VV* = V*V = I. In the last 
equality of Equation [24] we see that each eigenvalue A n mea- 
sures the contribution of its corresponding eigenforeground 
v n to the total foreground variance. 

In Section [3.4| we will seek to describe measured fore- 
ground spectra in terms of eigenforeground components. In 
other words, we wish to find the weight vector a (with com- 
ponent afe for the k th eigenforeground) in the equation 



y = x + n = 22 °* v fc 
fc=i 



n = Va + n, 



(25) 



where y, x, and n are all vectors of length equal to the 
number of channels N c , containing the measured spectrum 
of foregrounds, the true foregrounds, and the noise respec- 
tively. The vector x, for instance, is simply a discretized and 
whitened version of x(v) in Equation [I] Once an estimate a. 
of a has been determined, one can multiply by V to obtain 
an estimator x of the foreground spectrum. We will find that 
by measuring a small number of parameters, we can char- 
acterize foreground spectra to a very high precision, thanks 
to the special properties of the eigenforegrounds, which we 
describe in Section [3[2] 



3.2 Features of the Eigenforegrounds 

Since the foreground spectra do not contain any sharp fea- 
tures^] we expect their rather featureless frequency depen- 
dences to be well-described by a small number of eigenfore- 
grounds, i.e. we expect the eigenvalues A„ to be large only 
for the first few values of n. Performing the analysis using 
the foreground model of Section [2] for a noiseless experi- 
ment with 50 equally spaced frequency channels going from 
100 MHz to 200 MHz, we see in Figure[T]that the eigenvalues 
do fall off rapidly with mode number; indeed, the fall-off is 
exponential (a fact that we will explain later in this section), 



fine relative perturbations about the smooth predictable power 
law spectrum. Since the relative fluctuations are quantified by R, 
not C, we should correspondingly use R for our principal com- 
ponent analysis. 

6 Throughout this paper, we will use Greek indices to denote dif- 
ferent frequencies and lowercase Latin indices to denote different 
foreground components/modes. In Section [4] we will use upper- 
case Latin indices to denote the different foreground parameters 
listed in Ta ble [T] 

7 Following [Morales et aT] \2Q0&\ ; |Gleser et al.]j2008| , we assume 
that narrowband contaminants such as terrestrial radio stations 
and radio recombination lines have been excised from the data 
prior to this point in the analysis. 
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Figure 1. Eigenvalues of R with no noise for an experiment with 
50 frequency channels, equally spaced from 100 MHz to 200 MHz. 
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Figure 2. Eigenvalues of R for an experiment with 50 frequency 
channels, equally spaced from 100 MHz to 200 MHz, and noise 
levels given by the fiducial model of Section|2.6| 



and in going from the first eigenvalue to the third there is 
a drop of more than three orders of magnitude. This means 
that if we select as our foreground parameters the expansion 
coefficients of Equation |25| we can account for almost all 
of the foreground signal by measuring just a few numbers. 
In fact, in a realistic experiment it is impossible to measure 
more than the first few eigenvalues, as the foreground sig- 
nal quickly becomes subdominant to the instrumental noise. 
This can be seen in Figure [2] where we once again show the 
eigenvalues, but this time including the noise model of Sec- 
tion |2.6| After the fourth eigenforeground, we see that the 
eigenvalues hit a noise floor because by then the eigenmodes 
are essentially measuring the noise. Note also that the large 
drop in magnitude of successive eigenvalues makes the quali- 
tative results of this paper robust to our assumption that the 
noise is white in our non-dimensionalized units — because 
the various eigenmodes contribute to the total foreground at 
such different levels, a slight chromaticity in the noise will 
not change the mode number at which the eigenmodes hit 
the noise floor in Figure [2] Violating our assumption will 
thus have very little effect on our results. 

It should also be pointed out that in an analysis where 
the cosmological signal is included and is larger than the in- 
strumental noise (as would be the case for a long integration 
time experiment), one would expect to hit a "signal floor" 
rather than a noise floor. The break in the eigenvalue spec- 
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Figure 3. First few eigenvectors of R ( "eigenforegrounds" ) for 
an experiment spanning a frequency range from 100 MHz to 
200 MHz. 
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Figure 4. First few eigenvectors of R ("eigenforegrounds") for 
an experiment spanning a frequency range from 100 MHz to 
200 MHz, but with the C aa normalization factor restored so that 
the eigenforegrounds have units of temperature. 



trum in going from the exponential decay of the foreground 
dominated region to a flatter signal region can potentially be 
used as a diagnostic for separating foregrounds from signal. 
To properly investigate the robustness of this method, how- 
ever, requires detailed simulations of the signal. Since the 
focus of this paper is foreground modeling (and not signal 
extraction), we defer such an investigation to future work. 

In Figure [3] we show the first few foreground eigenvec- 
tors, and in Figure [4] we restore the frequency-dependent 
normalization factors that were divided out in Equations 
|22| In both cases the eigenvectors are the ones defined by 
Equation [23] i. e. for the noiseless case, and henceforth we 
will only be using these noiseless eigenvectors. Even though 
we will include noise in our subsequent analysis, the use of 
the noiseless eigenvectors represents no loss of generality be- 
cause the eigenvectors simply form a convenient set of basis 
vectors that span the space. In addition, with our assump- 



tion of white noise (in the units defined by Equation 221, 
the inclusion of noise alters only the eigenvalues, not the 
eigenvectors. 

Several eigenforeground features are immediately ap- 
parent from the plots. First, the n th eigenforeground is seen 
to have n — 1 nodes. This is due to the fact that correla- 
tion matrix R is Hermitian (since it is real and symmet- 
ric), so Equation 23 takes the mathematical form of a time- 
independent Schrodinger equation for a one-dimensional sys- 
tem, and the node theorem of quantum mechanics applies. 
As a consequence, each successive foreground eigenmode 
probes a more rapidly fluctuating component of the spec- 
trum, which explains the rapid fall in eigenvalues seen in 
Figure [I] — since foregrounds are such smooth functions of 
frequency, the more rapidly oscillating eigenmodes are sim- 
ply not required. 

Another characteristic feature of Figure [3] is that the 
n th eigenforeground appears to look like a polynomial of or- 
der (n — 1), albeit with some slight deviations (the "linear" 
1st eigenmode, for instance, has a small curvature to it). 
This approximate polynomial behavior explains the success 



of line-of-sight foreground subtraction schemes (Wang et al. 
[2006} |Bowman et al.|2009| |Liu et al |2009a|b| ) that subtract 
low-order polynomials from foreground spectra — by sub- 
tracting out low-order polynomials, one is subtracting the 
modes with the largest eigenvalues, and since the eigenval- 
ues fall so quickly (indeed, exponentially) with mode num- 
ber, the result is that most of the foregrounds are cleaned 
out by the process. Together, Figures [3] and [4] also explain 
why it was found in Liu et al. ( 2009b[ ) that polynomial fore- 



ground cleaning performs well only over narrow (~ 1 MHz) 
frequency ranges. In Figure [4] it is seen that over large fre- 
quency ranges the eigenforegrounds do not behave like poly- 
nomials, and that the polynomial behavior is only evident 
after dividing out a power-law-like normalization. In other 
words, the eigenforegrounds in Figure [4] still have a rough 
power law dependence to their spectrum, and thus are not 
well-fit by polynomials. Only if the frequency range is nar- 
row will the power law dependence have a negligible effect, 
and so only in such a scenario would polynomial subtrac- 
tion do well. Fortunately, there is a simple solution to this 
problem — if one wishes to perform polynomial subtraction 
over a wide frequency range, one can still do so successfully 
by first dividing out a fiducial spectrum (in the same way as 
we did in writing down Equation \22\ before performing the 
fit^ Put another way, we simply have to perform the fits 
in Figure [3] (where the modes always look like polynomials) 
rather than in Figure [4] 

3.3 Understanding the Eigenforegrounds 

Despite the fact that it can be useful (in our whitened units) 
to think of the eigenforegrounds as polynomials of succes- 
sively higher order, we caution that this interpretation is not 
exact. For instance, the first eigenmode in Figure[3]is clearly 
not precisely constant, and contains a small quadratic cor- 
rection. Indeed, we will now show that it is more useful to 
think of the whitened eigenmodes as functions that would be 
sinusoids in v were it not for an instrument's finite frequency 
range and spectral resolution. 

Suppose that unresolved point sources were the sole 
contributor to our measured foregrounds. Working in a con- 
tinuous description (to avoid fin ite bandwidth and resolu- 
tion) , the analog of Equation 22 takes the forrrj^] 

R(v, v 



C{v, u') 



(26) 



8 Alternatively, one can fit foregrounds over a large frequency 
range by using polynomials in logarithmic frequency space. As 
noted in |Bowman et aT] | |2009] l, however, one may want to avoid 
doing this because of the interferometric nature of most 21 cm ex- 
periments — since interferometers are not sensitive to the mean 
emission, negative values will be measured in certain pixels, mak- 
ing it problematic to take the logarithm. 

9 Aside from the omission of the noise term here, what follows is 



precisely the toy model introduced in Liu & Tegmark (2011 1 
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where C(y, v') is given by Equation 10 and 



-otp S -\-a a lnfy/z/*) 



(27) 



The resulting correlation function R(v, v') is given by 



R(v,v') — exp 



exp 



' 2 
iy-u') 
2u? 



f (lnz/ -In v' 

l\2-\ 



(28) 



where we have Taylor expanded about z/„ to obtain an 
unnormalized Gaussian with a coherence length of f c = 
v*/a a In i/*. Despite the fact that this expression was de- 
rived by only considering point sources, we find that it is 
an excellent approximation even when the full foreground 
model is used, provided one uses a longer coherence length 
to reflect the smoother Galactic synchrotron and free-free 
components (~ 64.8 MHz gives a good fit to the foreground 
model of Section |2J. 

Since R(y, u') now just depends on the difference v — u', 
the continuous analog of Equation [23] takes the form of a 
convolution: 



R(y - v')f n (v')dv = \ n f n (y) 



(29) 



where f n is the n th eigenforeground spectrum. Because con- 
volution kernels act multiplicatively in Fourier space, the 
/i's are a family of sinusoids. Indeed, plugging the ansatz 
fn(y) = sin(7 n i/ + (j>) into Equation 29 yields 



exp 



2v c 



sin(7„j/ + 4>)dv = A n sin(7„^ + . 



where the eigenvalue is given by 
\ n = v / 27n/f exp i 



-2^ C 7„J 



(30) 



(31) 



The eigenforegrounds shown in Figure [3] should therefore be 
thought of as a series of sinusoid-like functions that devi- 
ate from perfect sinusoidal behavior only because of "edge 
effects" arising from the finite frequency range of an exper- 
iment. As expected from the exponential form of Equation 
|31[ the modes quickly decrease in importance with increas- 
ing wavenumber 7„, and the more coherent the foregrounds 
(the larger v c is), the fewer eigenfunctions are needed to 
describe the foreground spectra to high accuracy. 

One discrepancy between the eigenvalue behavior in our 
analytic treatment and the numerical results earlier on in the 
section is its dependence on wavenumber j n . Analytically, 
Equation [31] predicts that 7„ should appear quadratically in 
the exponent, whereas numerically Figure [I] suggests some- 
thing closer to a linear relationship (provided one imagines 
that the mode number is roughly proportional to 7„, which 
should be a good approximation at high mode numbers). 
We find from numerical experimentation that part of this 
discrepancy is due to the finite frequency range edge effects, 
and that as one goes from a very small range to a very large 
range, the dependence on y n steepens from being linear in 
the exponential to being a power law in the exponential, 
with the index of the power law never exceeding 2. In any 
case, the role of Equation [3l] is simply to provide an intuitive 
understanding of the origin of the exponential fall drop in 
eigenvalues, and in the frequency ranges applicable to 21 cm 



tomography, we have verified numerically that a linear expo- 
nential fall-off fits the foreground model well. In particular, 
we can parametrize the eigenvalues with two parameters A 
and B, such that 



An ~ B exp {—An) . 



(32) 



and find that A = 6.44 and B = 5.98 x 10 5 fits the fore- 
ground model well for an instrument with a frequency range 
of 100 to 200 MHz and frequency bins of Au = 2 MHz. We 
caution, however, that these parameters vary with the fre- 
quency range, frequency bin width, and foreground model, 
and in general one must fit for A and B for each specific set 
of parameters. This is what we do to generate the numeri- 
cal results of the following subsection. Generically, though, 
A will always be somewhat larger than unity, so the fore- 
ground spectra will always be dominated by the first few 
eigenmodes. 



3.4 Eigenforeground Measurements 

So far, we have established that foreground spectra should 
be describable to a very high accuracy using only a small 
number of principal foreground components, with the im- 
portance of the k th foreground component quantified by the 
ak coefficient in a, which was defined in Equation [25] In a 
practical measurement, however, the presence of noise means 
that one does not know the true value of a, but instead must 
work with an estimator a that is formed from the data. 
There are many different ways to form this estimator, and 
one possibility would be to minimize the quantity 



X 



(y- VafN-^y-Va), 



(33) 



where N is defined as (nn*) using the noise vector n of Equa- 
tion 25 This least-squares minimization (which is optimal 



if the noise is Gaussian) yields 
sl LS = [V t N _1 V] _1 V t N" 



V" l y 



vV, 



(34) 



since the number of eigenmodes (i.e. the length of a LS ) is 
equal to the number of frequency channels (i.e. the length 
of y), so that V i s a square orthogonal matrix and V* = 
V -1 . Equation 34 states that even in the presence of noise, 
the least squares prescription calls for us to follow the same 
procedure as we would if there were no noise, namely, to 
take the dot product of both sides of Equation [25] with each 
eigenforeground vector. 

Defining an error vector e = 5a = a — a as the difference 
between the true a and the estimator a, we could instead 
choose to minimize the diagonal quantities (|ei| 2 ). This cor- 



responds to so-called Wiener filtering ( |Tegmark| 1997 1 , and 
the estimator is given by 



~ Wiener 

a 



SV l VSV 1 + N 



(35) 



where S = (aa*) is the signal covariance matrix for the 
vector a. 

In Sections |3. 4. 1| and |3. 4. 2| we examine Wiener filtering 
and least-squares minimization. We will find that the esti- 
mated foreground spectrum x = Va from Wiener filtering 
is expected to be closer to the true foreground spectrum, 
and that the least-squares method can be adapted to give 
similar results, although with errors larger than for Wiener 
filtering. 
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3.4.1 Wiener Filtering 

With the Wiener filtering technique, our estimate of the fore- 
ground spectrum is given by 

^Wiener = y-Wiener = ygyt ^gyt + N ] ~ X y = Wy 

(36) 

Understanding Wiener filtering thus comes down to un- 
derstanding the filter W. We first rewrite the expression 
slightly: 



■N 



VSTS + V'NVl 1 V t , 



(37) 

where we have made liberal use of the fact that V* = V 1 . 
From Section |2.6| we know that N is proportional to the 
identity if we use whitened units. In the notation of that 
section, we have N = k 2 I, so 



W 



vsrs + K 2 ii 1 v t . 



Now consider S: 

S = V'VSV^ = V t ((Va)(Va) t )V 
= V t (xx t )V = V'RV = A, 



(38) 



(39) 



where x denotes the true whitened foreground spectrum, 
as it did in Equation |25| In the penultimate step we used 
the fact that (xx*) is precisely our whitened foreground co- 
variance, i.e. R, and in the last step we used Equation |24| 
Inserting this result into our Wiener filter gives 



W 13 = VA [A + k 2 I] VM ..=^Vi 



/ I T ^— ^ ) SlmV t mj ■ 

Lm 



(40) 

where as before N c is the number of frequency channels 
(which is equal to the number of eigenforegrounds) . This 
in turn means that our estimator takes the form 



3 



A, 



<y*y)i 



]T V.,'rM" = E ^a^v^i/i), (41) 



wh ere Vj is the j th eigenforeground as defined in Equation 

Wiener 
s 1 for 



23 



and Wj = Xj/(Xj + K ) can be thought of as 



weights" for the j eigenforeground. Since Wj 



preserves eigenmodes that have high signal to noise while 
suppressing those that have low signal to noise. In words, 
the Wiener filtering procedure thus amounts to first per- 
forming a least-squares fit to obtain estimates for the eigen- 
foreground coefficients, but then in the reconstruction of the 
foreground spectrum, downweighting modes that were only 
measured with low signal-to-noise. 

In Figure[5]we show the Wiener weights for the first ten 
eigenmodes for an experiment with 50 frequency channels, 
equally spaced from 100 MHz to 200 MHz, and noise levels 
given by the fiducial model of Section |2.6| The weights are 
seen to be small after the first few eigenmodes, suggesting 
that most of the information in our foreground spectrum es- 
timate comes from a mere handful of parameters. The other 
parameters (i.e. eigenforeground coefficients) are too noisy 
to have much constraining power. Since the Wiener weights 
tend to unity in the limit of large signal-to-noise, we can 
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Figure 5. First few Wiener weights (defined as w = A/(A + k 2 ), 
where A is the eigenvalue of a foreground mode and k is the 
whitened noise) for an experiment with 50 frequency channels, 
equally spaced from 100 MHz to 200 MHz, and noise levels given 
by the fiducial model of Section |2.6| 



define an effective number of measurable parameters n e fi by 
summing the Wiener weights: 



(42) 



For the fiducial scenario in Figure [5] the numerical value of 
n e g is 4.06. 

In general, n e ff depends on both the nature of the fore- 
grounds and the noise level. If the noise levels were high, 
the difficulty in measuring the higher (and therefore sub- 
dominant) eigenmodes would result in those measurements 
being noise dominated. The Wiener filter would thus sup- 
press such modes, driving n e ff down. In the opposite limit 
where the noise levels are low, one would expect n e fi to 
be higher, but still to be relatively small. This is because 
we know from Section |3.2| that the foreground spectra are 
relatively simple functions that are dominated by the first 
few eigenmodes, and thus even if the noise levels were low 
enough for the higher modes to be measurable, it is unnec- 
essary to give them very much weight. In general, n c ff varies 
roughly logarithmically with increasing foreground-to-noise 
ratio since the foreground eigenvalues drop exponentially. 
This nicely explains one of the results from |Liu &i Tegmark| 
(20111. There it was found that in performing foreground 



subtraction by deweighting foreground contaminated line- 
of-sight Fourier modes, the number of modes that needed 
to be deweighted increased only logarithmically with the 
foreground-to-noise ratio. We now see that this is simply a 
consequence of Equation |32| 

In Figure [6] the red/grey dotted line shows n e a as a 
function of the total frequency range of an experiment. In 
changing the frequency range, we keep the integration time 
and the channel width Av constant so that the noise level 
remains unchanged. Any change in n Gtt - therefore reflects 
the nature of the foregrounds, and what one sees in Fig- 
ure [6] is that as the total frequency range increases, more 
and more modes are needed when estimating the spectrum. 
This is because effects like the spread in the spectral indices 
are apparent only over large frequency ranges, so increasing 
the frequency range makes foregrounds more complicated to 
model. 
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spectrum, our mean-square measurement error becomes 
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Figure 6. Shown with the red/grey dashed curve, the effective 
number n e ff of foreground parameters used (defined by Equation 
|4L'[ ) in the Wiener filtering method. In solid black is the optimal m 
(Equation [48] adjusted for the fact that m must be an integer) for 
the truncated least-squares method. In both cases the behavior is 
shown as a function of the total frequency range of an instrument, 
with channel width and integration time (and thus noise level) 
held constant at 1 MHz and 1000 hrs, respectively. 



To quantify the success of a foreground model gener- 
ated from Wiener filtered measurements, we consider the 
quantity 5x™ = x wicllcr - x: 

<5x™ = ( w j ^ S - a J ) 

3 

= Vy(wj - l)aj - Y VijWjVkjiik 



3 



3,k 



J2( w 3 ~ r;a v X W 'i n v 



where in the penultimate step we substituted a 1 



(43) 



a + 



25 



and 



34 1, and in the 



V*!! (which follows from Equations 
ultimate step recognized that ^jVijWjVkj is the Wiener 
filter W of Equation |40| The covariance of this quantity is 

N c 

(5x w 5xl) = £(a iai )(toi - l)(ity - l) Vi v* + W(nn t )W t 

i,j 
N c 

= \i(wi - l) 2 v,v* + WNW' 

i 

= Y M w » _ !) 2v » v i + ^ Y ^ ViV » ( 44 ) 

i i 

where in the penultimate step we used (a^a^) = Sy 
A = XiSij, and in the last step used Equation [40] as well 
as N = k 2 I. Note that there are no cross terms in going 
from Equation [43] to [44] because such terms would be pro- 
portional to {^2 i aiViii ) = (xn'), which is assumed to be 
zero because the foregrounds and the noise are uncorrelated. 
The diagonal terms in Equation [44] give the expected mean- 
square measurement error at a particular frequency. Averag- 
ing over frequency to obtain a figure-of-merit for the entire 



£ Wiener = (5x™5x^) 



2 N c N c 



N, 



(45) 



where in the last step we permuted the sums and used the 
fact that the eigenforeground vectors are normalized. From 
this expression, we can see that there are essentially two 
sources of error, each represented by one of the terms in 
Equation [45] The first term is only significant for the higher 
order eigenmodes, where Wi ~ 0. With each element in 
the sum proportional to Xi, this term quantifies the error 
incurred by heavily de-weighting the higher order modes 
(which are de-weighted so much that they are essentially 
omitted from the estimator). Of course, we have no choice 
but to omit these modes, because their measurements are 
so noisy that there is essentially no foreground information 
contained in them. The second term is significant only for 
the first few eigenforegrounds, which have Wi ~ 1. Being 
proportional to k 2 , this term quantifies the error induced by 
instrumental noise in the measurement of the lower order 
modes (which are included in the estimator). 

In Figure[7] we show (with the red/grey dashed line) the 
fractional error in an estimate of a foreground spectrum as a 
function of the total frequency range of an instrument. The 
channel width Av and integration time At are held constant, 
so there is no change in the noise level. From the plot, we 
see that as the frequency range of the instrument increases, 
one is able to construct an increasingly accurate estimator 
of the foreground spectrum. To understand why this is so, 
note that as the frequency range increases, there are two 
effects at play — first, a larger frequency range means more 
complicated foregrounds, possibly increasing the errors in 
the fits; however, a larger frequency range with fixed chan- 
nel width results in more data points to fit to, as well as 
a longer "lever arm" with which to probe spectral features, 
thus reducing the errors. What we see in Figure[7]is that the 
second outweighs the first. This is unsurprising, since we saw 
from Figure [6] that as the frequency range is increased, the 
number of modes required to describe the foregrounds in- 
creases rather slowly, suggesting that the foreground spectra 
are only getting marginally more complicated. 



3.4-2 Truncated Least-Squares 

In a conventional least-squares fitting of a foreground spec- 
trum, one forms an estimator for the spectrum by taking 
a L and multiplying by V. In other words, one fits for all the 
eigenforeground coefficients (i.e. the vector a) and multiplies 
each coefficient by the spectrum of the corresponding eigen- 
foreground. This, however, is a suboptimal procedure for es- 
timating the true spectrum because the higher order eigen- 
modes have very low signal-to-noise and measurements of 
them tend to be noise dominated. Such modes should there- 
fore be excluded (or heavily downweighted) , which was what 
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Figure 7. Expected error on measured foregrounds divided by 
the root-mean-square (r.m.s.) foreground intensity to give a frac- 
tional foreground modeling error. In dashed red/grey is the error 
for the Wiener filtering of Section |3.4.1| (the square root of Equa- 
tion [45] divided by the r.m.s.). In solid black is the error for the 
truncated least-squares method of Section |3. 4. 2| (Equation |49| di- 
vided by the r.m.s. and adjusted for the fact that m must be an 
integer). In both cases we have plotted the errors on a log-log scale 
as functions of the total frequency range for an instrument with 
a fixed 1 MHz channel width and 1000 hrs of integration time, en- 
suring a constant noise level. The black dotted lines are included 
for reference, and are proportional to 1/ \fN c Av. 



the Wiener filtering of the previous section accomplished. In 
this section we explore a simpler method where we measure 
all eigenforeground coefficients, but include only the first m 
eigenmodes in our estimate of the spectrum, truncating the 
eigenmode expansion so that the noisier measurements are 
excluded. 

In this truncated least-squares scheme, our estimate of 
the spectrum for the i th frequency channel takes the form 



x, LS 



E 



~LS 1 \ 



(46) 



where m is an integer to be determined later. Written in this 
way, we see that we can reuse all the expressions derived in 
the previous section as long as we set Wi = 1 for i m 
and Wi = for i > m. Put another way, the truncated-least 
squares method approximates Wiener filtering by replacing 
the plot of Wiener weights shown in Figure [5] with a step 
function. Using Equation |45| we thus see that the mean- 
square error for the truncated least-squares method is 



2 



^I>(^-l) 2 + ^£ 



N, 
1 

K 

B 



=1 



i — m + 1 



exp [— A(m + 1)] + rn- 



(47) 



N c " n Nc 

where in the last step we used the fact that the eigenvalues 



fall off in a steep exponential (Equation 32 1 that decays 



quickly enough that we can to a good approximation omit 
all but the first term in the sum. 

From Equation |47| we see that the truncated least- 
squares method gives large errors for extreme values of m, 
both large and small — at large m, many modes are in- 



cluded, making the exponential term in the error small, but 
at the cost of large noise contamination from the second 
term; at low m, the noise term is small, but too few eigen- 
foregrounds are included in the estimator of the spectrum, 
and the first term is large. To obtain the best possible error 
from the truncated least-squares method, we must therefore 
choose an intermediate value of m that minimizes e\ s . Dif- 
ferentiating with respect to m and setting the result to zero 
allows one to solve for the optimal m: 



m op t 



(48) 



and inserting this into our expression for the error gives a 
best error of 



best 

£ls 



VN c AisAt 



hi 



/ ABy/AtAi 



"fid 



+ 1 



(49) 



where we have made the substitution k — > Kfid/V AtAv to 
emphasize the scalings with integration time At and channel 
width Av. Note that Equation [49] is only approximate, since 
m must take on an integer value. 

In Figure [6] the solid black curve shows a func- 

tion of Nc Av, fitting numerically for A and B (Equation 32 1 
for each N c Av. The channel width Av and At are held at 
1MHz and 1000 hrs respectively, as described in Section [2.6| 
Since Av and At axe constant, the normalized noise level k 
is also constant, and the variation in m opt is due purely 
to changes in A and B. This variation is seen to be quite 
weak, and we see that the optimal number of components 
to solve for is always a rather small number, and that this 
number increases only slowly with the number of channels 
N c (or equivalently with the total frequency range N c Av, 
since the channel width Av is being held constant). This is 
unsurprising, given that the first term in Equation [47] de- 
cays exponentially while the second term rises only linearly, 
forcing the optimal m towards small m. With any current- 
generation experiment, we find m op t to be no larger than 4. 
Also note that m opt behaves like a discretized version of n e ff 
from the Wiener filtering, as is expected. 

In Figure [7] we show how the optimal m values of Fig- 
ure [fj] translate into the error Etotai > again as a function of 
N c Av. The error is divided by the root-mean-square value 
of the foregrounds over the entire spectrum to give a rough 
percentage error, and is plotted as the solid black line, while 
reference lines proportional to l/^/N c Av are shown in dot- 
ted black. The error is seen to decrease mostly with the in- 
verse square root of the total frequency range N c Av, as one 
would expect from the prefactor in Equation [49] The devia- 
tion from this behavior is due to the fact that the second part 
of Equation[49]depends on A and B, which in turn vary with 
N c . This, however, is a rather weak effect, since the relevant 
parts of Equation [49] look a lot like our expression for m opt , 
which we know from Figure [6] rises slowly. The result is that 
deviations from a 1/ y/N c Av scaling occur only when one is 
close to the transition in m op t (which remember, must be 
an integer). Again, we see that that truncated least-squares 
method closely approximates the Wiener filtering method, 
with only slightly larger errors. 

With both methods, we find that there is very little 
change in the error if one changes the channel width Av 
while correspondingly adjusting iV c so that N c Av (i.e. the 
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total frequency range of the instrument) is kept constant. 
For the truncated least-squares method, such changes keep 
the prefactor in Equat ion 1 49 1 fixed , and any variations in the 
error are due purely to the corrections in the second part 
of the equation. Numerically, the lack of sharp spectral fea- 
tures in the foregrounds means that these corrections are 
found to be completely subdominant. This implies that to 
an excellent approximation, the errors in our measured fore- 
grounds are dependent only on the total frequency range 
of our instrument, and are independent of how we bin our 
data — binning more coarsely results in a lower noise per 
frequency bin, but this is canceled out (to a very high pre- 
cision) by the larger number of bins with which to perform 
our ntsFl From this, we see that the decrease in errors seen 
in Figure [jj (where we increased the number of data points 
by increasing the total frequency range of the instrument) 
is due not to intrinsically better fits, but rather to a longer 
"lever arm" over which to probe foreground characteristics. 



3.4-3 Which method to use? 

In summary, we can see from Figure [6] that it is indeed the 
case that 21 cm foregrounds can be accurately characterized 
using just a small number (~ 3 or 4) of independent com- 
ponents. From Figure [7] we see that both Wiener filtering 
and the truncated least-squares method allow foregrounds 
to be estimated to an accuracy of roughly one part in 1CF 5 
to 10~ 6 . This is fortunate since the cosmological signal is 
expected to be ~ 10 -4 smaller than the foregrounds, so a 
failure to reach at least that level of precision would ruin 
any prospects of a cosmological measurement. 

With the Wiener filtering and the truncated least- 
squares giving such similar results, for most applications it 
should not matter which method is used. However, if one 
requires the very best foreground model achievable, then 
Wiener filtering should be used, since it was derived by 
minimizing the error. On the other hand, the least-squares 
method has the advantage of being simpler, and in addition 
is more immune to inaccuracies in foreground modeling. This 
is because Wiener filtering explicitly involves the foreground 
covariance S, and for the method to minimize the modeling 
error, it is important that the foreground-to-noise ratio be 
accurate. On the other hand, the foregrounds only enter the 
least squares method via the choice of basis (i.e. through 
V), and since this basis spans the vector space, the role of 
the foreground model in least-squares fitting is simply that 
of a prior model. Ultimately, though, the Wiener filtering 
method's need for an estimate of the foreground covariance 
is unlikely to be an obstacle in practice. As demonstrated in 
|de Oliveira-Costa et al.| |2008), it is possible to derive a prin- 
cipal component basis (and the corresponding eigenvalues) 
empirically. 



4 THE DIFFICULTY IN MEASURING 
PHYSICAL PARAMETERS 

In the previous section, we saw that the foregrounds can 
be accurately described using only a small number of fore- 
ground eigenmodes. Put another way, there are very few 
distinguishing features in radio foreground spectra, so there 
are really only a few independent parameters in the data. In 
this section, we will see that this places severe constraints 
on our ability to measure the physical parameters listed in 
Tabled] 

We employ a Fisher matrix formalism to estimate the 
best possible error bars on measurements of these parame- 
ters. To do so, we imagine that one has already used Equa- 
tion [34] to compute an estimator a of the expansion coeffi- 
cient vector from the datj* 1 ") The precise value of this esti- 
mator will vary because of the random nature of instrumen- 
tal noise, whose effect can be quantified by computing the 
error covariance matrix of a. Since both the Wiener filtering 
method and the truncated least-squares method require as a 
first step a full least-squares estimation of a, we can use the 
standard formula for the error covariance E for least-squares 
fitting ( ]Tegmark|19"97| >: 

= K 2 !, (50) 



E=([a-(a)][a-(a)] t ) = [V t N- 1 V] 



where like before we used the fact that in our whitened units, 
N = k 2 I. Once again, the angle brackets (...) denote an 
ensemble average, or (equivalently from the standpoint of 
practical observations) an average over many independent 
lines-of-sight. With this covariance, the probability distri- 
bution L of a is given by 



L(a;0) 



x /(27r) JV -detS 



-V 

2 L 



oxp -- a-(a}]«E- l [fi-<a}] 

(51) 

where E is the error covariance matrix, and = 
62, 03, ■ ■ - ) is a vector of model parameters (such as the 
parameters in Table [TJ, which enters our expression because 
the true expansion coefficient vector a depends on these pa- 
rameters. Interpreted as a function of for a given mea- 
surement a, L is the so-called likelihood function for the 
parameters that we wish to constrain. The tightness of our 
constraint is closely related to the Fisher information ma- 
trix, which is defined as 



/ d 2 C 



\ de A de L 



(52) 



where C = - 
unbiased, i.e. 



-lnL. If our estimator for the parameters is 



(0) = ©o 



(53) 



where 0o is the true parameter vector, then the Cramer- 
Rao inequality states that the error bars on 6a (defined by 



10 This is provided one makes the (usually excellent) assumption 
that the number of frequency bins is much larger than the number 
of independent foreground modes, seen in this section and the 
previous one to be 3 or 4. 



Instead of considering the expansion coefficients, one could in- 
stead deal with the foreground spectrum x and its estimator x di- 
rectly. However, unlike for the expansion coefficients, the spectra 
themselves have the unfortunate property of having non-diagonal 
error covariances (e.g. Equation |43| for Wiener filtering), which 
makes the subsequent analysis and interpretation in this section 
more cumbersome. 
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Figure 8. Parameter derivative vectors s A (Equation p9t for 7, 
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Figure 9. Parameter derivative vectors s A (Equation \59\ for B, 
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the standard deviation A9 A = \ZWaT 



(8a) 2 ) satisfy 



A9 A 5s (F- 1 ) 



1/2 

A .4 



(54) 



if we estimate all the parameters jointly from the data. Com- 
puting the Fisher matrix thus allows us to estimate the abil- 
ity of an experiment to constrain physical parameters, with 
the covariance between the parameter estimates equal to 
F _1 if the data analysis is done in an optimal fashion. 

Let us now compute the Fisher matrix for the fore- 
ground parameters listed in Table [T] Inserting Equation |51| 
into Equa tion |52| and perf orming some matrix algebra, one 
can show (Teg mark et al.||l997| ) that the Fisher matrix re- 
duces to 



F 



rTr 



+ 



,-i9(a) 



de A 



de L 



(55) 



dO A d9 B _ 

Referring back to our expression for X, we see that the first 
term vanishes because S depends only on the noise level and 
not on the physical parameters. This gives 

' d(B t )\( d^Y 



de A 



(56) 



Note that this expression depends on the mean vector (a), 
not the estimator a. This is because the Fisher matrix for- 
malism tells us what the error bars are for the optimal 
method, whatever that method happens to be. We thus do 
not expect a to appear in Fab, for if it did we would have 
the freedom to plug in a possibly sub-optimal estimator of 
our choosing, and by construction the Fisher matrix formal- 
ism contains information about the optimal errors. 

With (a) signifying the true expansion coefficient vec- 
tor, we know from Equations |2| and |25| that 



(a) = V 4 (x) 



(57) 



Inserting this into our expression for the Fisher matrix, we 
obtairF] 



1 ^m^,,,^ /9m 



= ~^S A ■ SB, 
K 



(58) 



where we have used the fact that V* = V 1 since the eigen- 
foregrounds are orthogonal, and have defined 



sa 



dm 

del' 



(59) 



Each component of Equation [58] can be geometrically inter- 
preted as a dot product between two s vectors in an N c - 
dimensional space. As m encodes the whitened foreground 



spectrum (Equation 22 1 , each s vector quantifies the change 
in the expected foreground spectrum with respect to a phys- 
ical parameter. And since the final covariance matrix on the 
parameter errors is given by F" 1 , the greater the dot prod- 
uct between two different s vectors, the more difficult it is 
to measure the two corresponding physical parameters. The 
forms of different sa's thus give intuition for the degenera- 
cies in a set of parameters. 

Shown in Figures [5] and [5] are plots of the s A vectors for 
the foreground parameters shown in Table [T] For clarity, we 
have separated the derivatives into two plots that have differ- 
ent vertical scales. Many of the parameters derivatives have 
similar shapes, suggesting that they will have a large "dot 
product" in Equation [58] and therefore large degeneracies 
between them. Note that the overall normalization of the 
curves is irrelevant as far as degeneracies are concerned. This 
is because two parameters with identically shaped curves 
but different normalizations are still completely degenerate 
as one can perfectly compensate for changes in one param- 
eter with smaller (or larger) changes in the other. Thus, to 
quantify the degree of degeneracy, we can form a matrix 
of normalized dot products between the s A vectors, where 
the magnitude of each vector is individually normalized to 
unity. The results are shown in Table [2] where we see that 
the parameters form three degenerate groups — the normal- 
ization parameters (B, 7, A sync , Aff), forming a degenerate 
4x4 block of ~l's in the top left corner; the spectral index 
parameters (a ps , a 3ync , aff), forming a degenerate block in 
the middle; and the frequency coherence parameters (a a , 
Aa 3ync , Aa//), forming a degenerate block in the bottom 
right corner. Indeed, computing the eigenvalues of the nor- 
malized Fisher matrix (i.e. of Table [5|, one finds only three 
eigenvalues of order unity and higher, with the next largest 
eigenvalue of order 1CP 3 . This suggests that there are really 
only three independent foreground parameters that can be 
measured. 

To gain an intuition for what these independent param- 
eters quantify, consider the first few eigenvectors of the nor- 
malized Fisher matrix F shown in Table [ij that is, vectors 
that satisfy 

FO'n = A„C (60) 
These parameter eigenvectors are listed in Table [3] where 



12 We have implicitly assumed that the whitening procedure per- 
formed in Equations |22| was with respect to some fiducial fore- 
ground model, so that V does not depend on the foreground pa- 
rameter vector <~>. 
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Table 2. Dimensionless dot products between vectors (Equation |59| l for the foreground parameters listed in Table [T| or equivalently, 
a normalized version of the Fisher matrix (given by Equation |58| l so that the diagonal elements are unity. The derivatives were evaluated 
at the fiducial foreground parameters for the foreground model described in Section[2] The frequency range of the experiment was taken 
to go from 100 MHz to 200 MHz, with 50 equally spaced channels. The matrix is symmetric by construction, so the bottom left half has 
been omitted for clarity. 
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Table 3. Eigenvectors (Equation |60[ l of the normalized Fisher matrix (Tablc[2jl. Each row represents an eigenvector, and going from top 
to bottom the eigenvectors are arranged in descending value of eigenvalue. 
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each row gives the weighted average of the original param- 
eters that one must take to form the "eigenparameters" . 
To see what foreground parameters are well characterized 
by 21 cm tomography, we look at the first few eigenvectors, 
which can be measured with the highest signal-to-noise ra- 
tio: 



■ a ff) 



0A(B + 7 + a a + A 3ync + Aff + A.a 3ync + A 
+0.1(a pa + aff) + 0.2a aync 
Q.2(B + 7 + Aff + A sync ) - 0.1((Ta + Aa aync ) 
-0.5(a ps + a aync + aff) - 0.02Aa// 
0.3(S + 7 + A aync + A ff ) + 0.2(a ps + a ff ) + 0.1 
— 0.4(a a + Aa sync ) — 0.5Aa//, 



(61) 



where for clarity the coefficients have been rounded to one 
significant figure (see Table [3] for more precise values). The 
first eigenparameter weights all the parameters equally ex- 
cept for the spectral indices, which are somewhat down- 
weighted. The first eigenparameter is thus roughly an "ev- 
erything but spectral indices" parameter. Crudely speaking, 
this parameter is a generalized normalization parameter be- 
cause it is mostly comprised of normalization parameters (B, 
7, A sync , Aff) and frequency coherence parameters, whose 
spectral effects are not apparent until the frequency range is 
large. Examining the third eigenvector, we see that it is most 
heavily weighted towards the frequency coherence parame- 
ters, suggesting that we do have an independent parameter 
that acts as a "generalized frequency coherence". However, 
being the third eigenvector, our ability to make measure- 



ments of it is lower than with the first eigenvector or the 
second eigenvector, which is a "generalized spectral index" . 

To see which degrees of freedom we cannot constrain in 
our foreground model, we consider the last couple of eigen- 
parameters: 



6»g « 0.18B + 0.II7 - Q.82A a ,, nc + 0.52A// 
9 10 « 0.7B-0.717, 



(62) 



where we have omitted all terms with coefficients less than 
0.1. The last eigenvector is dominated by B and 7, which 
appear with almost equal and opposite magnitudes. This 
suggests that differences between B and 7 are extremely 
hard to measure. In the penultimate eigenvector, the sum of 
coefficients for B, 7, and Aff is roughly equal and opposite 
to the coefficient of A sync . This implies that the collective 
difference between A sync and the linear combination of B, 
7, and Aff is also difficult to tease out from the data, but 
less so than the difference between B and 7. 

That we can only measure a small number of fore- 
ground parameters independently is perhaps unsurprising, 
given that in Figure [I] we saw that most of the foreground 
power comes from a small number of eigenforeground modes 
that lack distinctive features. Note that since « enters our ex- 
pression for the foreground parameter Fisher matrix (Equa- 
tion 581 as an overall multiplicative constant, instrumental 
noise has no effect on the foreground parameter degeneracies 
we examined in this section. The parameter degeneracies are 
due purely to the form of the foregrounds, and have noth- 
ing to do with the noise. On the other hand, when quan- 
tifying the effective number of parameters in the Wiener 
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filter (Equation 42 1 or when solving for the optimal num- 



ber of eigenforeground modes to measure in the truncated 



least-squares method (Equation 48 1, our expressions explic- 
itly depend on the noise level. For instance, in the limit 
of a noiseless experiment the best characterization of fore- 
grounds is obtained by measuring all the eigenmodes, not 
the three or four modes found in Sections 13.4.11 and 13.4.21 
In general, however, the results of this section show that 
one must be able to measure at least three eigenmodes at a 
reasonable signal-to-noise in order to adequately model the 
foreground spectrum. If Equation [42] or Equation [48] suggest 
that fewer eigenmodes should be used in the model, it is 
simply because one's experiment is too noisy to allow the 
foregrounds to be measured accurately. 



5 CONCLUSIONS 

In this paper, we have shown that despite the complicated 
dependence that low-frequency radio foregrounds may have 
on physical parameters, the resulting spectra are always 
rather generic and featureless. The bad news from this came 
in Section [4] where we saw that even the most careful fore- 
ground spectrum measurements are unlikely to yield inter- 
esting constraints on foreground physics, thanks to high lev- 
els of degeneracies between different foreground parameters. 
This, however, is good news for those who simply consider 
foregrounds an impediment to 21 cm tomography. In Sec- 
tion [3] we saw that it was precisely because the foreground 
spectra were so featureless that they could be characterized 
to a greater accuracy than necessary for foreground sub- 
traction using only three or four independent parameters. 
This bodes well for 21 cm astrophysics and cosmology, for it 
suggests that extremely detailed and physically motivated 
foreground models will not be necessary for successful fore- 
ground cleaning. 
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